Reproducing Spectre Attack with gem5: How To Do It Right?



Pierre Ayoub - EURECOM

pierre.ayoub@eurecom.fr


Clémentine Maurice - Univ Lille, CNRS, Inria

clementine.maurice@inria.fr


EuroSec'21


26 April, 2021

Introduction

Whoami

Intern
Ph.D. Student

About This Presentation

The initial idea

  • Can we simulate transient execution attacks? \(\Rightarrow\) Spectre.
  • Why?
    • Rely on tiny CPU details to work \(\Rightarrow\) difficult to simulate.
    • Hard to study and reproduce \(\Rightarrow\) simulation could help.
  • Comparing a real system and a simulated system:
    • Raspberry Pi, ARM processor.
    • gem5, micro-architectural simulator.
  • Goals:
    • Attack that works similarly on both systems.
    • Compare the faithfulness of the simulation.
    • Discover how gem5 could be helpful.

What really happened

Problems

  1. Available Spectre implementations failed on our Raspberry Pi \(\Rightarrow\) guidelines and custom implementation.
  2. Reproducing a micro-architecture is impossible.
  3. gem5 needed some extension to compare it to the real system \(\Rightarrow\) patch.

Contributions

  • Guidelines that are important for micro-architectural security research.
  • Usage of gem5 for helping attack development and understanding.
  • Simulation of Spectre and evaluation of faithfulness.
  • Requirements for gem5 to simulate these attacks.

Related Work

  • No specific literature about the simulation of micro-architectural attacks.

J. Lowe-Power - Visualizing Spectre with gem5






  • Blog post
  • x86
  • Default gem5 configuration
  • We wanted to go deeper!

The Spectre Attack

A Transient Instruction?

Summary
  • An instruction has been transiently executed if it affects the CPU micro-architectural state while leaving the architectural state as it was prior to its execution.
  • If the new micro-architectural state depends on a secret and the attacker is able to probe it, they can recover the secret.

The Branch Predictor

  • Predict instruction flow when branches are encountered.
  • Prediction is dynamic: it is based on previous executions.

The PHT, A Structure Used by the Branch Predictor
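
As a rough illustration of how a pattern history table behaves, here is a minimal sketch assuming one 2-bit saturating counter per entry, indexed by the low bits of the branch address. The class name, the table size and the branch address are hypothetical; real predictors (and the gem5 models listed later) are more elaborate.

```python
class PatternHistoryTable:
    """Toy PHT: one 2-bit saturating counter per entry."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, branch_addr):
        # Assumption: the table is indexed by the low bits of the address.
        return branch_addr % self.entries

    def predict(self, branch_addr):
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


pht = PatternHistoryTable()
BRANCH = 0x40  # hypothetical address of a bounds-check branch

# Training: the branch is taken several times in a row.
for _ in range(4):
    pht.update(BRANCH, taken=True)
print(pht.predict(BRANCH))  # the predictor now says "taken"

# A single not-taken outcome does not flip a saturated counter.
pht.update(BRANCH, taken=False)
print(pht.predict(BRANCH))
```

The hysteresis of the counter is what makes training possible: after a few taken outcomes, the next execution is predicted taken regardless of the actual condition.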

How Does Spectre Work?

We Need A Target

void victim_function(int x)
{
    ...
    if (x < array1_size)
        y = array2[array1[x]];
    ...
}
  • If x is malicious (out of bounds), array1[x] reads the secret value!

Steps Of The Attack

Summary
Details
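
The steps of the attack can be sketched with a toy model. This is not a real exploit and not the authors' implementation: the cache is a Python set, the predictor is a single 2-bit counter, and the memory layout (a secret placed right after array1) is a made-up assumption for illustration.

```python
# Toy memory: array1 occupies the first 8 bytes, the secret follows it.
MEMORY = bytearray(b"public__" + b"SECRET")
ARRAY1_SIZE = 8

cache = set()  # indices of array2 entries currently "cached"
counter = 0    # 2-bit saturating counter for the bounds-check branch


def victim_function(x):
    """Toy model of `if (x < array1_size) y = array2[array1[x]];`."""
    global counter
    predicted_taken = counter >= 2
    actually_taken = x < ARRAY1_SIZE
    if predicted_taken or actually_taken:
        # The load runs (possibly only transiently) and touches array2.
        cache.add(MEMORY[x])
    # The branch resolves: a misprediction is squashed architecturally,
    # but the cache footprint stays. The predictor is then updated.
    if actually_taken:
        counter = min(3, counter + 1)
    else:
        counter = max(0, counter - 1)


def leak_byte(malicious_x):
    # 1. Train the predictor with in-bounds indices.
    for i in range(4):
        victim_function(i % ARRAY1_SIZE)
    # 2. Flush array2 from the cache.
    cache.clear()
    # 3. Call the victim with the out-of-bounds index.
    victim_function(malicious_x)
    # 4. Probe: the only cached array2 entry reveals the byte.
    (value,) = cache
    return value


recovered = bytes(leak_byte(ARRAY1_SIZE + i) for i in range(6))
print(recovered)
```

In a real attack, step 4 is a timing measurement over all array2 entries rather than a direct look at the cache content.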

Guidelines: How To Develop and Reproduce the Attack

  • Why?
    • Hard to reproduce the attack using existing implementations.
    • Hard to develop a functional attack on a vulnerable processor.
  • Refer to the paper for more details.

Development

Compiler version, compiler and manual optimizations
Timer for covert channel
Prefetcher, re-ordering
  • Very tricky because we make blind assumptions.
  • gem5 can help resolve this!
Transient execution window
  • Time during which transient executions can happen.
  • e.g. if (x < array1_size) vs. if ((float) x / (float) array1_size < 1): the floating-point division takes longer to resolve, which lengthens the window.

Our implementation is also full of tips in its comments; feel free to look at it!

Reproducibility

  • Pinning
  • Page size
  • Frequency
  • Mitigations

The gem5 Simulator

gem5

  • Micro-architectural simulator, cycle-accurate.
  • State-of-the-art project, started in 2011.
Parts
  • C++ core where logic is programmed.
  • Python interface where systems are built.
Architecture
Alpha, ARM, Power, SPARC, x86, MIPS, RISC-V.
Generic Micro-Architecture
From very simple models to a 7-stage out-of-order pipelined processor.
Branch Prediction
Bi-Mode, TAGE, Two-Level, Perceptron, Tournament…

How To Use It?

Building a Simulated System (Snippets)

# Snippet: parameters of a cache
size = '32kB'
assoc = 2
data_latency = 1
mshrs = 4
tgts_per_mshr = 8
write_buffers = 4
prefetcher = StridePrefetcher(queue_size=4, degree=4)

# Snippet: per-CPU setup
for cpu in self.cpus:
    cpu.createThreads()
    cpu.createInterruptController()
    cpu.branchPredAdd()

# Snippet: caches are added only in timing memory mode
if system.getMemoryMode() == "timing":
    self.cacheAddL1()
    self.cacheAddL2()

# Snippet: connecting the TLB walkers and the cache ports to a bus
cpu.dtb.walker.port = bus.slave
cpu.itb.walker.port = bus.slave
cpu.dcache_port = bus.slave
cpu.icache_port = bus.slave

# Snippet: Linux kernel command line
kernel_cmd = [
    "console=ttyAMA0",
    "root=/dev/vda1",
    "rw",
    "mem=2G@0x80000000",
]
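
To see what the cache parameters above imply, the geometry can be worked out by hand. This is a plain-Python sketch, independent of gem5; the 64-byte line size is an assumption (the snippets do not state it), and the function names are hypothetical.

```python
def cache_geometry(size_bytes, assoc, line_bytes=64):
    """Return (lines, sets, offset_bits, index_bits) for a power-of-two cache."""
    num_lines = size_bytes // line_bytes
    num_sets = num_lines // assoc
    offset_bits = line_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return num_lines, num_sets, offset_bits, index_bits


def set_index(addr, num_sets, offset_bits):
    """Cache set selected by an address."""
    return (addr >> offset_bits) & (num_sets - 1)


# size = '32kB', assoc = 2, assumed 64-byte lines
lines, sets, offset_bits, index_bits = cache_geometry(32 * 1024, assoc=2)
print(lines, sets, offset_bits, index_bits)  # 512 256 6 8

# Two addresses that are sets * line size bytes apart compete for the same set.
stride = sets * 64
print(set_index(0x1000, sets, offset_bits) == set_index(0x1000 + stride, sets, offset_bits))
```

With only two ways per set, three addresses mapping to the same set are enough to evict one another, which matters when reasoning about the covert channel.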

Launching a Simulation

./build/ARM/gem5.opt ./configs/example/arm/starter_fs.py --num-cores=4 --disk-image="aarch64-ubuntu.img" --kernel="vmlinux.arm64"
gem5/util/term/m5term localhost 3456
==== m5 slave terminal: Terminal 0 ====
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd070]
[    0.000000] Linux version 4.18.0+ (arm-employee@arm-computer) (gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1))
[    0.000000] Machine model: V2P-CA15
[    0.000000] Memory limited to 2048MB
...
...
[    0.256634] random: init: uninitialized urandom read (12 bytes read)
[    0.271877] init: hwclock main process (684) terminated with status 1
[    0.286689] random: mountall: uninitialized urandom read (12 bytes read)

Ubuntu 14.04 LTS aarch64-gem5 ttyAMA0

aarch64-gem5 login: root

Welcome to Ubuntu 14.04 LTS (GNU/Linux 4.18.0+ aarch64)

root@aarch64-gem5:~#

Benefits of (Pipeline) Visualization

  • gem5 \(\Rightarrow\) outputs the state of any element in the system.
  • Konata \(\Rightarrow\) graphically visualizes the pipeline of a simulated processor.

Transient execution of a read instruction with a malicious index

Branch predictor being trained

Spectre defeated by the branch predictor

Implementing…

…the Spectre Attack on ARM

  • IAIK implementation failed to perform the attack successfully.
  • Needed an implementation with the following requirements:
    1. Stable results,
    2. Follows our guidelines,
    3. Usable both on the Raspberry Pi and on gem5,
    4. Metrics output.
Steps

…an ARM gem5 System

  • Steps:
    1. Syscall emulation system
    2. Caches
    3. Branch predictor \(\Rightarrow\) Spectre working
    4. Full-system simulation
    5. Patch for perf_event \(\Rightarrow\) measurements working
  • Getting closer to the ARM Cortex-A72 of the Raspberry Pi.

Details of the gem5 System

Configuration of the Branch Predictor

Considering Spectre, Both Predictors Are Equivalent

Is The Simulation Faithful?

  • A simulation that is not faithful is of limited use.
  • Measurements of metrics:
    1. Retrieved bytes: Similar
    2. Iterations: About half as many needed on gem5 (the attack is roughly two times easier)
    3. Cycles: Three times faster on gem5
    4. Cache misses: Aberrant result
    5. Mispredicted branches: Similar

Conclusion

  • If simulation becomes widely used:
    • Easier to reproduce older attacks for understanding and experimentation.
    • With faithful models, researchers could use the simulator itself to discover new vulnerabilities.
  • But…
    • Simulation is currently slow.
    • Simulator still needs improvements and extensions.
  • In summary:
    • It is possible to simulate micro-architectural attacks accurately.
    • Visualization is a very powerful technique to understand the micro-architectural behavior.

Website

https://pierreay.github.io/reproduce-spectre-gem5/

Questions?

pierre.ayoub@eurecom.fr

Appendices

The Microarchitectural Domain

Flush+Reload
Often used to probe the cache state.
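
The decision step of Flush+Reload boils down to comparing a reload latency with a threshold. Below is a minimal sketch of that step only; the latencies are hypothetical placeholder values in cycles, not measurements, and the midpoint-of-medians threshold is just one simple choice.

```python
from statistics import median


def choose_threshold(hit_latencies, miss_latencies):
    """Midpoint between the median cache-hit and cache-miss latencies."""
    return (median(hit_latencies) + median(miss_latencies)) / 2


def cached_lines(reload_latencies, threshold):
    """Indices of the probed lines whose reload was fast, i.e. cached."""
    return [i for i, t in enumerate(reload_latencies) if t < threshold]


# Hypothetical calibration data (cycles), for illustration only.
hits = [40, 42, 45, 41, 43]
misses = [210, 190, 230, 205, 220]
threshold = choose_threshold(hits, misses)

# Hypothetical reload latencies of four probed lines after the victim ran.
reloads = [200, 215, 44, 198]
print(threshold, cached_lines(reloads, threshold))
```

On real hardware the quality of the timer determines how well the two latency populations separate, which is why the timer is listed among the development guidelines.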

Simulation Modes

Syscall Emulation
gem5 plays the role of the operating system: it emulates every system call made by a binary running on the simulated hardware.
Full-System Simulation
gem5 runs an entire operating system over the simulated hardware.
Baremetal
gem5 runs native assembly code over the simulated hardware, without any operating system layer.

Results

Table 1: Ratio between gem5 and Raspberry Pi runs for each metric. A value below 1 means that gem5's metric is lower than the Raspberry Pi's metric.
Metric                  Accuracy Ratio (Mean)   Accuracy Ratio (Standard Deviation)
Retrieved Bytes         1.05                    NaN
Iterations              0.57                    3.81
Cycles                  0.31                    2.12
Cache Misses            584.08                  4581.02
Mispredicted Branches   0.99                    2.41
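
One plausible reading of Table 1, stated here as an assumption rather than the authors' exact method, is that each ratio divides a gem5 statistic by the corresponding Raspberry Pi statistic. The sketch below uses made-up placeholder samples to show how such ratios could be computed and why a NaN can appear when both standard deviations are zero.

```python
import math
from statistics import mean, pstdev


def ratio(a, b):
    """a / b, or NaN when the ratio is undefined (b == 0)."""
    return a / b if b != 0 else math.nan


def accuracy_ratios(gem5_runs, rpi_runs):
    """(ratio of means, ratio of standard deviations), gem5 over Raspberry Pi."""
    return (
        ratio(mean(gem5_runs), mean(rpi_runs)),
        ratio(pstdev(gem5_runs), pstdev(rpi_runs)),
    )


# Placeholder samples: every run retrieves the same number of bytes,
# so both standard deviations are 0 and their ratio is undefined.
mean_ratio, std_ratio = accuracy_ratios([10, 10, 10], [10, 10, 10])
print(mean_ratio, std_ratio)

# Reading the mean ratios from Table 1 the other way round:
for metric, r in {"Iterations": 0.57, "Cycles": 0.31}.items():
    print(metric, round(1 / r, 2))  # 1.75 and 3.23
```

Inverting the ratios gives the "times fewer" figures: a mean ratio of 0.57 corresponds to about 1.75 times fewer iterations on gem5, and 0.31 to about 3.2 times fewer cycles.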